Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Design and Performance Analysis of a Parallel Decision Tree Algorithm on a Data Mining Grid

Chen Ping (陈平); Qiao Xiuquan (乔秀全); Liu Zhen (刘臻); Tian Xiaoping (田小萍)

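  1. Information Network Center, Beijing Normal University
  2. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
  • Received: 2009-04-13 Revised: 1900-01-01 Online: 2009-04-28 Published: 2009-04-28
  • Corresponding author: Chen Ping (陈平)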

Design and Performance Analysis of a Parallel Decision Tree Algorithm on Data Mining Grid

  • Received:2009-04-13 Revised:1900-01-01 Online:2009-04-28 Published:2009-04-28

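Abstract (translated from Chinese): A parallel version of the C4.5 decision tree algorithm is proposed, allowing the traditional serial classification algorithm to run in parallel on a data mining grid composed of multiple PCs and servers. By partitioning the data both vertically and horizontally and parallelizing the recursive procedure, the algorithm achieves scalable, high-performance parallel computation and addresses the lack of a good parallel classification algorithm for massive data sets. Guidelines for running the parallel algorithm efficiently are also given. Experiments and algorithm analysis show that the performance of the parallel algorithm is affected by several factors and that it achieves high parallel efficiency and speedup.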

Keywords: data mining, grid computing, decision tree, parallel performance

Abstract: A parallel C4.5 decision tree algorithm is proposed for a group of personal computers and servers. The algorithm allows data mining to run efficiently in parallel on the data mining grid. Vertical and horizontal data partitioning is introduced to parallelize the recursive procedure of the algorithm. The algorithm is scalable and addresses the lack, so far, of an efficient parallel algorithm. Analysis and experiments on the parallel decision tree show that the computing efficiency is affected by several parameters and that the algorithm achieves high performance and a high computing speedup. Guidelines for enhancing the efficiency are proposed as well.

Key words: data mining, grid computing, decision tree, parallel performance
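
The abstract mentions vertical and horizontal data partitioning but does not give the procedure. The following is a minimal, generic sketch of how such partitioning can be combined when choosing a C4.5 split attribute; it is not the authors' algorithm. The function names (`local_counts`, `gain_ratio`, `best_attribute`) and the row layout (class label in the last column) are assumptions made for illustration. Each horizontal shard (a subset of rows) produces local (value, label) counts, the counts are merged, and the per-attribute loop is the vertical dimension that could be distributed across nodes.

```python
from collections import Counter
from math import log2


def local_counts(rows, attr_index, class_index=-1):
    """Count (attribute value, class label) pairs on one horizontal shard."""
    return Counter((row[attr_index], row[class_index]) for row in rows)


def entropy(counts):
    """Shannon entropy (in bits) of a collection of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def gain_ratio(pair_counts):
    """C4.5 gain ratio of one attribute from merged (value, label) counts."""
    by_label, by_value = Counter(), Counter()
    for (value, label), n in pair_counts.items():
        by_label[label] += n
        by_value[value] += n
    total = sum(by_value.values())
    # Expected class entropy after splitting on the attribute.
    remainder = 0.0
    for value, n_value in by_value.items():
        labels = [n for (v, _), n in pair_counts.items() if v == value]
        remainder += n_value / total * entropy(labels)
    gain = entropy(by_label.values()) - remainder
    split_info = entropy(by_value.values())
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(shards, n_attrs):
    """Merge per-shard counts and return the attribute with the highest gain ratio."""
    scores = {}
    for a in range(n_attrs):      # vertical dimension: one task per attribute
        merged = Counter()
        for shard in shards:      # horizontal dimension: one task per row shard
            merged += local_counts(shard, a)
        scores[a] = gain_ratio(merged)
    return max(scores, key=scores.get), scores


# Toy data: (outlook, windy, play); the last column is the class label.
rows = [
    ("sunny", "yes", "no"),
    ("sunny", "no", "no"),
    ("rain", "yes", "yes"),
    ("rain", "no", "yes"),
]
shards = [rows[:2], rows[2:]]
best, scores = best_attribute(shards, n_attrs=2)
print(best, scores)
```

Because only small count tables travel between nodes, the merge step costs the same however many rows each shard holds. In this sketch both loops run sequentially; on a real grid each `local_counts` call would be dispatched to the node that holds the shard.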
